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Abstract 

Background: Mapping of DNase I hypersensitive sites (DHSs) is a powerful tool to experimentally identify cis- 
regulatory elennents (CREs). Annong CREs, enhancers are abundant and predominantly act in driving cell-specific 
gene expression. Kruppel-like factors (KLFs) are a family of eukaryotic transcription factors. Several KLFs have been 
demonstrated to play important roles in hematopoiesis. However, transcriptional regulation of KLFs via CREs, 
particularly enhancers, in erythroid cells has been poorly understood. 

Results: In this study, 23 erythroid-specific or putative erythroid-specific DHSs were identified by DNase-seq in the 
genomic regions of 17 human KLFs, and their enhancer activities were evaluated using dual-luciferase reporter 
(DLR) assay. Of the 23 erythroid-specific DHSs, the enhancer activities of 15 DHSs were comparable to that of the 
classical enhancer HS2 in driving minimal promoter (minP). Fifteen DHSs, some overlapping those that increased 
minP activities, acted as enhancers when driving the corresponding KLF promoters (KLF-Ps) in erythroid cells; 
of these, 10 DHSs were finally characterized as erythroid-specific KLF enhancers. These 10 erythroid-specific KLF 
enhancers were further confirmed using chromatin immunoprecipitation coupled to sequencing (ChlP-seq) 
data-based bioinformatic and biochemical analyses. 

Conclusion: Our present findings provide a feasible strategy to extensively identify gene- and cell-specific 
enhancers from DHSs obtained by high-throughput sequencing, which will help reveal the transcriptional 
regulation and biological functions of genes in some specific cells. 
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Background 

Biological processes such as proliferation, apoptosis, dif- 
ferentiation, development, and aging require elaborately 
orchestrated spatial and temporal gene expression, 
which are often under the control of c/5-regulatory ele- 
ments (CREs). CREs, including promoters, enhancers, si- 
lencers, insulators and locus control regions (LCRs) etc., 
are abundant in the human genome [1]. Characterization of 
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CREs in the genome contributes to understand the com- 
plexities of gene transcription and expression in different 
biological systems [2,3]. In the past decade, the project 
ENCyclopedia of DNA Elements (ENCODE) has facilitated 
the prediction of functional elements including CREs in the 
human genome [4]. However, CRE characterization in 
terms of gene and cell specificities as well as chromatin 
context dependency remains a huge challenge. 

Nucleosome-depleted DNA regions, characterized by 
their sensitivity to nuclease digestion, are closely associated 
with almost all known classes of active CREs. In contrast, 
DNA regions tightly wrapped in nucleosomes and higher- 
order structures are more resistant to nuclease digestion. 
Therefore, DNase I hypersensitive sites (DHSs) mark many 
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types of CREs [2,5]. Individual DHSs within small regions 
of the genome (10-20 kb) have been traditionally identified 
using Southern blot analysis [6]. However, this labor- 
intensive and low-throughput approach cannot be scaled to 
study large chromosomal regions and entire genomes. 
By hybridizing DNase I-digested fragments to tiled 
microarrays, the DNase-chip assay provides an undirected, 
unbiased, highly sensitive and specific strategy to simultan- 
eously identify thousands of DHSs within any region of 
interest or even the entire genome, with a resolution of 
200-500 bases [7]. Furthermore, DNase-seq (identification 
of DNase I-digested fragments by next-generation sequen- 
cing) allows genome-wide mapping of DHSs with base-pair 
resolution [2]. Using these high-throughput technologies, 
DHS mapping is emerging as a powerful tool for locating 
open chromatin regions that encompassing many types of 
CREs within the genome [2,3,8] and thus it facilitates 
the delineation of the roles of DHSs in regulating 
tissue- and developmental stage-specific expression of 
nearby genes [9,10]. 

Enhancers are the most variable CREs that can regu- 
late the expression of genes from a long distance and in 
a position- and orientation-independent manner [11]. In 
general, it is accepted that enhancers function by first 
recruiting sequence-specific transcription factors (TPs) 
that recognize short DNA motifs within the enhancers. 
Upon binding to enhancers, the sequence-specific TPs 
recruit mediator complexes, histone modifiers and chro- 
matin remodelers to activate the transcription of target 
genes [12]. Enhancers often exist in a cell- and develop- 
mental stage-specific manner [13], and the distribution 
of cell-specific enhancers correlates well with cell- 
specific gene expression [14], suggesting that they are 
the primary force driving spatial- or temporal-specific 
gene expression. To date, several lines of evidences have 
demonstrated the roles of erythroid-specific enhancers 
in driving erythroid-specific gene expression. LCR at the 
|3-globin locus is the most prominent erythroid enhancer 
that exerts a strong effect specifically on erythroid cells. 
This LCR enhances the developmental stage-specific ex- 
pression of globin genes and the expression of linked 
heterogeneous non-globin genes in erythroid cells by 
interacting with respective promoters [15]. HS2, a clas- 
sical enhancer located in LCR, appears to be functional 
in erythroid cells at both embryonic and adult develop- 
mental stages, suggesting its crucial roles in the activa- 
tion of globin genes in erythroid cells throughout 
ontogenesis [16]. Other erythroid-specific enhancers 
have also been found in the genomic regions of GATA- 
1, stem cell leukemia {SCL), L-type pyruvate kinase and 
5-aminolevulinate synthase 2 {ALAS2) genes [17-20], 
which may contribute to the restricted expression of 
these genes in the erythroid lineage. A strikingly large 
number of enhancers have been systematically identified 



in erythroid K562 cells using chromatin immunoprecipi- 
tation followed by genome tiling array (ChlP-chip) ana- 
lysis [14]. The characterization of these and other 
erythroid or erythroid-specific enhancers in the human 
genome will facilitate the understanding of regulation 
and functions of associated genes in erythroid cells. 

Kriippel-like factors (KLPs) are a subfamily of zinc- 
finger proteins that contain three tandem Cys2His2 zinc 
fingers at the highly conserved carboxyl terminus. KLPs 
are important components of the eukaryotic cellular 
transcriptional machinery. By regulating the expression 
of several genes driven by GC-rich or CACCC- 
containing promoters, KLPs participate in many bio- 
logical processes, including hematopoiesis, adipogenesis, 
stem cell maintenance, and tumorigenesis [21]. In par- 
ticular, several KLPs have been demonstrated to play 
crucial roles in erythroid differentiation. Globin genes, 
including a-, 8-, y- and p-globin genes, are prominent 
biomarkers in erythroid cells, and their spatial and tem- 
poral expression is closely correlated with erythroid dif- 
ferentiation and development [22]. KLPl (EKLP), an 
erythroid-specific TP, activates adult |3-globin gene 
expression [23] and regulates gene switching from y- to 
|3-globin [24] as well as definitive hematopoiesis. 
Other KLPs mainly play significant roles in primitive 
hematopoiesis. Por example, KLP4 activates the expres- 
sion of a- and y-globin genes [25,26]. Expression of 
embryonic 8- and fetal y-globin genes is stimulated by 
KLP2 [27], KLPll [28,29], and KLP13 [30] but is 
suppressed by KLP3 [31] and KLP8 [32]. In addition, 
KLP6 [33] and KLPl 7 [34] are required for primitive 
hematopoiesis. Hematopoietic defects or anemia have 
been observed in several /<r//-knockout mice, including 
mice lacing Klfl [35], Klf2 [27], Klj3 [36], Klf6 [33], and 
KlflS [37]. Interestingly, cross-regulation among KLPs 
has been reported during erythropoiesis, erythroid 
differentiation, and globin gene regulation [36,38,39]. 
However, till date, few studies have been conducted to 
investigate the c/5-transcriptional regulation of KLPs by 
erythroid-specific enhancers, with the exception of 
the study on murine KLPl enhancers [40]. Therefore, 
characterization of erythroid-specific enhancers will shed 
light on molecular mechanisms that regulate transcription, 
expression, and functions of KLPs in erythroid cells. 

Here, we characterized gene- and cell-specific en- 
hancers in the genomic regions of human KLPs exten- 
sively by combining high-throughput sequencing data as 
well as biochemical and bioinformatic analyses. Our 
mRNA-seq data in human embryonic stem cells (HESC) 
and three primary erythroid cell types demonstrated that 
human KLPs, including KLPl, KLP3, KLP6, KLP9, 
KLPIO, KLPll, KLP13, and KLP16, were up-regulated in 
erythroid cells as compared to HESC. We also mapped 
DHSs in the genomic regions of 17 human KLPs across 
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four erythroid and seven non-erythroid cell types and 
screened out 23 erythroid-specific or putative erythroid- 
specific DHSs. Using the dual-luciferase reporter (DLR) 
assay, we identified 10 (43%) erythroid-specific enhancers 
embedded in the genomic regions of KLFl, KLF6, KLF9, 
and KLF13. The nature of these identified erythroid- 
specific enhancers was confirmed by a series of bioinfor- 
matic and biochemical analyses, contributing to understand 
the mechanism by which KLFs are regulated in erythroid 
cells. Our present findings provide a feasible strategy to 
characterize cell- and gene-specific enhancers from DHSs 
generated from high-throughput sequencing across various 
cell types, and to facilitate the illustration of transcriptional 
regulation and functions of genes in specific cell types. 



Results 

Expression of some KLFs is up-regulated in erythroid cells 

Expression patterns of KLFs were obtained from the 
mRNA-seq dataset that was originally designed to explore 
the dynamic transcriptomes during human erythroid differ- 
entiation and development (Yang Y, Wang H, Chang KH, 
Qu H, Zhang Z, Xiong Q, Qi H, Cui P, Lin Q, Ruan X, et 
al: Transcriptome dynamics during human erythroid differ- 
entiation and development, submitted). The following cell 
types were examined: undifferentiated HESC, embryonic 
stem cells-derived erythroid cells (ESER), fetal liver-derived 
erythroid cells (FLER), and adult mobilized peripheral 
blood CD34+ cells-derived erythroid cells (PEER). As 
shown in Figure lA, of the 17 KLFs examined, the 
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Figure 1 Differential expression patterns of KLFs in HESC and primary erytliroid cells. (A) Gene expression profile of KLFl -17 in undifferentiated 
HESC and tliree primary erytliroid cells (ESER, FLER, and PBER) examined using mRNA-seq analysis. Gene expression intensity was calculated by normalizing 
the read counts to reads per kilobase of the exon model per million mapped reads (RPKM) according to the gene length and total mapped reads. KLF2 
and KLF14 were not detected and thus were not shown in this figure, whereas KLFlOa and 10b represented two KLFIO isoforms. (B) Gene expression 
profile of KLFl, 3, 6, 9, 10, 11, 13, and 16 in undifferentiated HESC and three primary erythroid cells examined using quantitative real-time PCR. Transcript 
levels of KLFs were calculated in relation to that of 18S ribosomal RNA, and the expression levels of KLFs in ESER, FLER, and PBER were normalized to those 
in HESC. The error bars above each column indicate standard error of the mean (SEM) between triplicates. The Y-axis breaks at 3. Asterisks indicate that the 
differences between the levels of individual transcripts in erythroid cells (with Y values ranging from 1 to 3) and those in HESC were statistically significant 
by independent-samples t-test, ***p < 0.001, """p < 0.005. (C) Gene expression profile of KLF2, 4, 5, 7, 8, 12, 14, 15, and 17 in undifferentiated HESC and 
three primary erythroid cells examined using quantitative real-time PCR. 
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expressions of KLF2 and KLF14 were not detected, whereas 
KLF17 was poorly expressed in ESER. The expressions of 
KLF4, KLF5, KLF7, KLF8, KLF12, and KLF15 in HESC 
were higher than those in erythroid cells, whereas those of 
the remaining eight KLFs were higher in erythroid cells 
than in HESC. The expressions of all the 17 KLFs were fur- 
ther evaluated using quantitative real-time PGR. With the 
exception of KLF2 and KLF15, the expression patterns of 
the remaining 15 KLFs measured by PGR correlated well 
with mRNA-seq results (Figure IB, G). The cause of the in- 
consistency in KLF2 and KLF15 measurement is currently 
not clear but it could be platform related. Nevertheless, we 
proposed that the higher expressions of the eight KLFs 
detected by both two platforms may be attributed to the 
presence of erythroid-specific enhancers. 

DHSs in KLF genomic regions are distributed diversely in 
various cell types 

The DNase-seq dataset used in this study was generated 
by the University of Washington [41]. The dataset is 
composed of DNase-seq data in four erythroid cell types, 
including three primary erythroid cells (ESER, FLER, 
and PEER) and erythroleukemia K562 cells, to cover all 



the possible DHSs in the erythroid lineage and seven 
non-erythroid cell types, including HESG, GM12878, 
hTH2, HeLa, HepG2, GAG02, and BJ, to differentiate 
erythroid-specific DHSs from non-erythroid ones. A 
false discovery rate (FDR) threshold of 0.5% was used to 
define DHSs in each cell type. DHS mapping was pro- 
filed for all the KLF gene loci from 70 kb upstream of 
the transcription start sites (TSSs) to 20 kb downstream 
of the poly (A) sites. The current coverage of gene loci 
was determined based on the following reasons: First, 
the regions covering approximately 100 kb encompassed 
almost all intensive DHSs around the corresponding 
KLFs in the four cell types studied (Figure 2, Additional 
file 1: Figures SI and S2). Second, GTGF binding is 
reported to mark boundary elements between neighbor- 
ing genes [42]. These approximate 100-kb regions con- 
tain such ubiquitous GTGF-binding sites in the various 
cell types employed in the present study (data from 
UGSG Browser). The diverse distribution patterns of 
KLF DHSs among the various established cell types and 
HESG were shown in Figure 2, Additional file 1: Figures 
SI and S2, coinciding with the varying expression levels 
of KLF genes in these cell types. DHSs were considered 
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Figure 2 Distribution of DHSs in the genomic regions of KLFl and KLF9 genes. Chromatin profiles for KLFl (A) and KLF9 (B) are sliown to 
illustrate the distribution of DHSs in the genomic regions of KLF genes in four erythroid (in color) and seven non-erythroid (in black) cell lines. 
Erythroid-specific or putative erythroid-specific DHSs were named with Roman numbers. Erythroid-specific DHSs, which were present only in 
human erythroid cells, were indicated by arrows and columns in green. Putative erythroid-specific DHSs, which were also present in non- 
erythroid cell types but at lower intensities, were indicated by arrows and columns in blue. Therefore, KLFl -I, II, III, and IV (A) were considered as 
erythroid-specific DHSs, of which KLFl -I was located in the intron region of the RAD23A gene within the defined genomic region, and KLFl-V was 
a putative erythroid-specific DHS because of its presence in HeLa cells. Similarly, upstream KLF9-I and intronic KLF9-II were considered as putative 
erythroid-specific sites (B). 
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to be erythroid specific if they were only present in 
erythroid cells and were classified as putative erythroid 
specific if they were present in erythroid cells, while 
much subdued peaks were also detected in one or two 
non-erythroid cell types. 

Figure 2 A illustrated the DHS profiling of KLFl. Five 
prominent DHSs were detected in the KLFl genomic re- 
gion; of these KLFl -I was located at > 60 kb upstream of 
the KLFl gene and was only present in three primary 
erythroid cells, whereas KLFl -II, III, IV, and V were lo- 
cated proximal to the KLFl gene and were present in 
both primary erythroid cells and erythroleukemia K562 
cells. KLFl-V was a putative erythroid- specific site 
because a small peak for this site was also present in 
non-erythroid HeLa cells. The diverse cellular presenta- 
tion of DHSs was also observed in the profile of KLF9 
(Figure 2B), with two putative erythroid-specific DHSs— 
KLF9-I and KLF9-II— located 70 kb upstream or in the 
intron of the gene respectively. 

Other erythroid-specific or putative erythroid-specific 
sites in the KLF profiles are shown in Additional file 1: 
Figure SI. Erythroid-specific DHSs include DHS -I 
oiKLF2 (Additional file 1: Figure SIA), DHS-I and II of 
KLF3 (Additional file 1: Figure SIB), DHS-III and IV of 
KLF6 (Additional file 1: Figure SIC), DHS-I of KLFIO 
(Additional file 1: Figure SID), DHS-I and II of KLF13 
(Additional file 1: Figure SIF), DHS-II of KLF16 (Add- 
itional file 1: Figure SIG), and DHS-I of KLF 17 (Add- 
itional file 1: Figure SIH). In addition, DHS-III of KLF3 
(Additional file 1: Figure SIB), DHS-I and II of KLF6 
(Additional file 1: Figure SIC), DHS-I of KLFll (Add- 
itional file 1: Figure SIE), DHS-III of KLF13 (Additional 
file 1: Figure SIF), and DHS-I of KLF16 (Additional 
file 1: Figure SIG) were identified as putative erythroid- 
specific DHSs. The features of all the 23 erythroid- 
specific or putative erythroid-specific DHSs located in 
the KLF gene loci are summarized in Additional file 2: 
Table SI. DHS profiles of KLFs without erythroid- 
specific or putative erythroid-specific DHSs are shown 
in Additional file 1: Figure S2. It is also of interest to 
note that while KLF4 has been employed as a major re- 
programming factor required to reverse the highly differ- 
entiated somatic cells into pluripotent cells [43], its 
expression in HESC was lower than that of many other 
KLFs, consistent with ENCODE/Caltech RNA-seq data 
available on UCSC Browser. As shown in Additional file 
1: Figure S2A, weak peaks of DHSs (in HESC, ESER, 
FLER, and PBER) were found to be dispersed in the 
KLF4 locus, which could account for its relatively lower 
expression than that of the other family members in the 
present and previous studies [44] and its dispensable 
role for the self-renewal and pluripotency of ES cells 
[43,45]. In particular, DHS peaks in the KLF4 promoter 
region tend to decrease during erythroid differentiation. 



which may explain the down-regulation of KLF4 expres- 
sion in erythroid cells compared with that in HESC 
(Figure 1). 

Among the 23 prominent erythroid-specific or puta- 
tive erythroid-specific DHSs, 18 (78%) were located 
upstream of TSSs or downstream of poly (A) sites of 
the KLF genes. Only four (17%) DHSs were proximal 
(< 2 kb) to TSSs, whereas 15 (65%) were distal (> 10 kb) 
to TSSs (Additional file 1: Figure S3 A). Our data on the 
identified erythroid-specific DHSs are comparable to 
those of a previous DNase-chip report on the distribu- 
tion of cell type-specific DHSs within 1% of the human 
genome from six diverse cell types [3]. In contrast, with 
the exception of the erythroid-specific DHSs in the 
KLF2 and KLF 17 regions (Additional file 1: Figure SIA, 
H), most erythroid-specific or putative erythroid-specific 
DHSs were present in the eight KLF genes that were 
up-regulated in erythroid cells (Figure 1, Figure 2, 
Additional file 1: Figure SIB-G). Moreover, erythroid- 
specific or putative erythroid-specific DHSs were absent 
in the genomic regions of several KLF genes (Additional 
file 1: Figure S2), which did not have increased expres- 
sion in erythroid cells (Figure 1), implying that these 
KLFs may not function in erythroid cells or that they 
were not activated by erythroid-specific enhancers. 

Approximately 65% erythroid-specific or putative 
erythroid-specific DHSs are enhancers 

Cell type-specific DHSs have been reported to act as en- 
hancers [3]. To identify which erythroid-specific DHSs 
can serve as enhancers, DLR assay was performed to 
evaluate the enhancer activity of 23 DHSs in driving 
TATA box- containing minimal promoter (minP) [46]. 
K562 cells are immature erythroid cells widely used in 
studies of erythroid differentiation or other functions of 
the erythroid lineage; K562 is also one of the tier 1 cell 
types used in the ENCODE project with massive data 
available for subsequent analyses. Therefore, we selected 
the K562 cells to identify erythroid-specific enhancers 
in vitro. We found that enhancer HS2 in human |3- 
globin LCR strongly activated minP in erythroid K562 
cells (Figure 3). Therefore, HS2 was chosen as the posi- 
tive control for enhancer activity evaluation in this assay 
as previous reports did [15,16,47]. HS2 activated minP 
by approximately 5 fold in this study. Therefore, we 
defined DHSs that could activate minP by 5 fold, which 
was 2.32 after log2 transformation as presented in 
Figure 3, or higher as enhancers. We found that 15 
(65%) erythroid-specific or putative erythroid-specific 
DHSs had enhancer activity, with the activity of some 
being much stronger than that of HS2 (Figure 3), 
suggesting that minP was sensitive enough and sufficient 
for evaluating enhancer activity. Eleven (73%) of the 15 
enhancers were located in the intergenic regions and 
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Figure 3 Enhancer identification in K562 ceils using the DLR assay with minP. (A) The constructs used in the reporter assay system are 
depicted. The pGL4.23 vector containing one minP was set as a negative control. The pGL4.23 construct with enhancer HS2 was set as a positive 
control. pGL4.23 constructs with DHSs of interest in the place of HS2 were used to evaluate the enhancer activities of DHSs. (B) Enhancer 
activities evaluation of 23 erythroid-specific or putative erythroid-specific DHSs in K562 cells. Each construct was transfected in triplicate at a time 
and transfections were repeated at least twice. For each construct, the firefly luciferase activity was normalized to that of Renilla luciferase. The 
relative luciferase activity was shown in log2 scale, with that of minP set to 0. Standard deviations were shown as error bars above each column. 
DHSs with 5-fold or higher activities than that of minP were defined as enhancers. 



four (27%) were in the introns of KLF genes (Additional 
file 1: Figure S3B). The higher frequency of enhancers in 
the intergenic regions is consistent with previous reports 
concerning enhancer distribution [3,48] and could also 
be explained by the fact that 83% DHSs were located 
beyond the proximal regions (>2 kb) in this study 
(Additional file 1: Figure S3 A). The statistical analysis 
for enhancer distribution (Additional file 1: Figure S3B) 
indicated that enhancers tend to be distal [3]. Taken 
together, the present findings demonstrated that ap- 
proximately two thirds of erythroid-specific or putative 
erythroid-specific DHSs were enhancers. Indeed, the 
majority of the DHSs were found to possess enhancer 
functions, supporting our original hypothesis that the 
high-throughput mRNA-seq and DHS mapping together 
provided a powerful mean for the identification of po- 
tential enhancers in the genome. 

More than half of erythroid-specific or putative erythroid- 
specific DHSs activate KLF promoters in erythroid cells 

In different cells, enhancers activate gene expression by 
interacting with corresponding promoters. Therefore, we 
evaluated the enhancer activities of all the 23 DHSs in 
driving their respective KLF-Ps (Additional file 2: Table 
S2) in K562 cells. In total, 10 KLF-Ps were cloned and 
their activities were examined using the DLR assay 
(Additional file 2: Table S3). HS2 only activated some 
KLF-Ps in K562, HeLa, and HEK293 cells, and thus was 
used as a positive reference in this assay (Figures 4 and 
5). DHSs that significantly {p < 0.01) increased the 



activities of corresponding KLF-Ps were considered as 
enhancers. Of the 23 DHSs, 15 (65%) displayed enhancer 
activities with respective KLF-Ps (Figure 4 and Table 1) 
in K562 cells. Importantly, some DHSs exhibited pro- 
moter specificity; for example, DHSs such as KLF6-I, 
KLFIO-I, KLF13-II, and KLF16-I were strong enhancers 
on minP, but failed to activate their own promoters. In 
contrast, KLF3-I, KLF6-II, KLF6-III, and KLF13-III dem- 
onstrated strong enhancer activities with KLF3, KLF6 or 
KLF 13 promoters, but not with minP, indicating that 
KLF3-I, KLF6-II, KLF6-III, and KLF13-III are gene-specific 
enhancers, and that their enhancer activities are independ- 
ent of the TATA box [49] (Table 1). The distribution of 
erythroid KLF enhancers is shown in Additional file 1: 
Figure S3C. 

Approximately > 67% KLF enhancers are erythroid 
specific 

To further investigate the erythroid specificity of KLF 
enhancers, we transiently transfected constructs with 
KLF-Ps into non-erythroid HeLa and HEK293 cells. 
DHSs that activated KLF-Ps in K562 cells but not in 
HeLa and HEK293 cells were considered as erythroid- 
specific KLF enhancers. DLR assay (Figure 5) revealed 
that 10 enhancers located in the genomic regions of 
KLFl-II, III, IV, and V; KLF6-II, III, and IV; KLF9-I and 
II; and KLF13-III were erythroid specific. Coincidentally, 
the mouse homologue of KLF 1 -II has been previously 
identified as an erythroid-specific enhancer [40]. These 
results provide strong evidence that the erythroid 
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Figure 4 Enhancer identification in K562 cells using the DLR assay with KLF-Ps. (A) The constructs used in the reporter assay system are 
depicted. The pGL4.10 constructs containing respective KLF-Ps only were set as negative controls. The pGL4.10 constructs with KLF-Ps and 
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bars above each column. Asterisks indicate the statistically significant differences between the addition of DHSs and the KLF-Ps only in K562 cells 
***p < 0.001, **p < 0.01. 



specificity of DHSs determines the erythroid specificity 
of enhancers. The distribution of these erythroid-specific 
KLF enhancers is shown in Additional file 1: Figure S3D. 

The nature of erythrold-specific KLF enhancers was 
valldated by bioinformatic analyses 

In general, enhancers are characterized by species con- 
servation [10], characteristic H3K4mel and H3K27ac 
enrichment [14], chromatin accessibility [DNase I hyper- 
sensitivity, Formaldehyde-Assisted Isolation of Regula- 
tory Elements (FAIRE) sensitivity] [2,8,50], and binding 
capacity of TFs and coactivators [14]. According to the 
UCSC Genome Browser, of the 10 erythroid-specific 
KLF enhancers identified in this study, only five were 
conserved in placental mammals [51], whereas all the 



10 enhancers were enriched with H3K4mel and/or 
H3K27ac modifications [52], seven of which were eryth- 
roid specific, and six were occupied by erythroid-specific 
TFs GATA-1 and/or NF-E2 in erythroid K562 cells [53] 
(Table 1), further supporting their proposed roles as 
erythroid-specific enhancers. 

The Txn Factor ChIP track synthesizes all the available 
ENCODE ChlP-seq data in different cell types; these 
data were used to build a full view of TF-binding sites 
(TFBSs) on the 23 DHSs. The signal strength of TF oc- 
cupancy was quantified as a cluster score ranging from 0 
to 1000. Because most (> 90%) TFBSs on the 23 DHSs 
occurred in K562 cells, we drew a heat map of cluster 
scores for this erythroid cell type after K-means cluster- 
ing. As shown in Figure 6A, class II clustered four non 
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analysis were performed as described in Figure 4. 



enhancers (3-II, 16-11, l-I, and 13-1), four minP-driving 
enhancers (16-1, 6-1, lO-I, and 13-11), three KLF P- 
driving enhancers (13-III, 6-II, and 3-1), and two minP- 
and KLF-P-dual driving enhancers (6-IV and 3-III). 
These 13 DHSs were bound by fewer TFs than the 10 
DHSs in class I and III, which could account for their 
impaired or deficient enhancer activity in K562 cells. Of 
the remaining 10 DHSs, with the exception of 6-III, all 
were enhancers on both promoters in K562 cells because 
of the binding of multiple TFs. Furthermore, ll-I, 17-1, 
and 2-1 were strong enhancers in K562 cells but were 
not erythroid specific because of the strong binding 
of TFs in other cell types. The remaining seven DHSs 
were erythroid-specific KLF enhancers. As shown in the 
vertical axis, TFs clustered to different classes: class I 
clustered several erythroid differentiation-related TFs 



(GATA-1, TALI, and GATA-2) [22], and class II 
clustered enhancer-related factors (BATF, c-Fos, JunD, 
MEF2A, p300, PU.l, NFKB, STATl, and SPl), histone 
modifiers and chromatin remodelers [HDAC2, p300, 
SETDBl, SIRT6, SWI/SNF components (Brgl and Inil), 
HMGN3, and Inil] [12]. The features of TFs mentioned 
above were obtained from the NCBI Reference Sequence 
(RefSeq) database (http://www.ncbi.nlm.nih.gov/refseq/). 
In summary, in the open chromatin state, binding of TFs 
largely determines the enhancer activity of DHSs. 

The ChlP-seq data of the UCSC Genome Browser did 
not cover all the TFBSs on the genome. Therefore, we 
performed de-novo motif analysis using the Multiple 
Em for Motif Elicitation (MEME) online software and 
annotated these motifs based on ENCODE-motifs by 
using TOMTOM [54]. ENCODE-motifs database covers 
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Table 1 Characteristics of erythroid-specific or putative erythroid-specific DHSs of ten human KLF genes 
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DHS conservation (Cs) was observed across 32 placental mammals, including liumans, cliimps, mice, and rabbits. The chromatin accessibility using FAIRE, 
enhancer-associated histone modifications (H3K4me1 and H3K27ac), and TFBSs for erythroid GATA-1 and NF-E2 on DHSs in erythroid K562 cells were extracted 
from UCSC Genome Browser. The enhancers characterized from the erythroid-specific or putative erythroid-specific DHSs were summarized in the last four 
columns. Y: yes. ^: this histone modification is K562 specific or putative K562 specific. 



all known motifs for each factor curated from 
TRANSFAC, Jaspar and Protein Binding Microarray 
(PBM) experiments and their enrichment within corre- 
sponding TF-binding experiments, as well as novel 
regulatory motifs discovered by systematic application of 
several motif discovery tools (including MEME, MDscan, 
Weeder, AlignACE) and evaluated based on their enrich- 
ment relative to control motifs within TP-bound regions 
(Kheradpour P, Kellis M: ENCODE-motifs: systematic 
analysis of regulatory motifs associated with transcription 
factor binding in the human genome, submitted). The con- 
served motifs among erythroid-specific KLF enhancers are 
shown in Figure 6B (E-value: 5.4 x 10'^) and Fig ure 6C 
(E-value: 2.6 x 10^); the p-values of their occurrence in 
DHSs were 1.03 x 10'^^-1.78 x 10 ^ and 4.91 x 10'^- 
3.14 X 10'^, respectively. Motif 1 (Figure 6B) could be 
the binding sites of histone modifier HDAC2 and 
enhancer-related TF foxa [55], and motif 2 (Figure 6C) 
could be the motifs of enhancer-related TPs SPl, 
FoxA2, PU.l [12], and so on. The annotated motifs of 



HDAC2, SPl, STAT, and PU.l were consistent with the 
TFBSs in cluster II of the heat map (Figure 6A). 

Erythroid-speclficlty of KLF enhancers is validated using 
the biochemical electrophoresis mobility shift assay 

Because the discovered motifs were mainly enhancer re- 
lated, we performed the electrophoresis mobility shift 
assay (EMSA) to verify the erythroid specificity of 
erythroid-specific KLF enhancers. KLF9-I was selected 
as an example. Data from the UCSC Genome Browser 
indicate that sequence conservation of KLF9-I was ob- 
served in neither vertebrates nor placental mammals, 
but it was decorated with H3K4mel and H3K27ac en- 
hancer marks (Table 1) and harbored binding sites of 
erythroid-specific GATA-1 in erythroid K562 cells 
(Figure 7A). Because individual TFBSs can be relatively 
short and degenerate, they tend to be clustered to 
achieve precise temporal and developmental stage speci- 
ficities [56]. Factors bound to these sequences often 
interact with common coactivators, which, in turn. 
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of TFs binding to the 23 DHSs in K562 cells from ChlP- 
enhancers using MEME. The matched motifs with 



recruit the basal transcription machinery [57,58]. We 
used EMSA to further investigate whether erythroid- 
specific TFs or cofactor complexes bind to KLF9-I 
in vitro. Putative TFBSs in KLF9-I were annotated using 
the Txn Factor ChIP track and ENCODE-motifs on 
UCSC Genome Browser (Figure 7A). Sequence analysis 
indicated that the motif of enhancer-related protein 
p300 [3,14,59] was embedded in KLF9-I (Figure 7A), 
which may account for the enhancer nature of KLF9-I, 



whereas the motif of GATA-1 may account for the 
erythroid specificity of KLF9-L A pair of oligos 
(shadowed in Figure 7A) against KLF9-I were designed 
using the bioinformatic analysis. The EMSA result 
(Figure 7B) demonstrated that at least one protein com- 
plex in erythroid K562 cell extracts (band IV) specifically 
bound to the oligos, whereas no such binding was ob- 
served in non-erythroid HeLa and HEK293 cell extracts, 
implying that an erythroid-specific TF or cofactor 
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Figure 7 Biochemical analysis of erythroid-specific KLF enhancers. (A) Potential TFBSs on KLF9-I. TF occupancies on the erythroid-specific 
KLF enhancer KLF9-I were annotated using Txn Factor ChIP track and ENCODE-motifs on the UCSC Genome Browser. (B) Result of EMSA analysis. 
Biotin-labeled oligos were incubated with nuclear protein extracts of K562, HeLa, or HEK293 cells in the presence or absence of unlabeled 
competing oligos. Shifted bands are indicated by arrowheads on the left. Band IV (red) was found to be erythroid specific and was only detected 
in K562 cells, but not in HeLa or HEK293 cells. 
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complex bound to KLF9-I enhancer and drives KLF9 ex- 
pression in erythroid cells. The other four universal 
bands suggested that KLF9-I may exert some universal 
functions in other cell types through recruitment of 
some basic transcriptional regulatory protein complexes. 
However, KLF9 has not been shown to function in eryth- 
roid cells. The detailed mechanisms by which KLF9-I 
mediates transcriptional regulation of KLF9 in erythroid 
cells need further investigation. 

Discussion 

Mapping regulatory elements to the genes they regulate 
is of great importance to understand gene expression 
and functions. In this study, we provide a feasible strat- 
egy to extensively identify gene- and cell-specific en- 
hancers from DHSs based on a rigorous and practical 
high-throughput sequencing technique (Figure 8). First, 
we refined the expression patterns of human KLFs from 
mRNA-seq data and proposed that the higher expression 
of eight KLFs in erythroid cells may be ascribed to the 
presence of erythroid-specific enhancers. Second, we 
screened erythroid-specific DHSs in the genomic regions 
of 17 human KLFs from DNase-seq dataset from four 
erythroid cell types and seven non- erythroid cell types, 
which largely improved the accuracy of prediction. 
Third, we extensively evaluated enhancer activities of all 
the 23 erythroid-specific or putative erythroid-specific 
DHSs using the DLR assay for promoter and cell-type 
specificities. Lastly, we validated the enhancer nature 
and erythroid specificity of erythroid-specific KLF en- 
hancers through bioinformatic and biochemical analyses. 

As the major contributors to cell- and developmental 
stage-specific gene expression, enhancers have been exten- 
sively predicted by many genome-wide approaches, includ- 
ing sequences conservation-, motif-, and chromatin-based 
computation methods [10,48,59]; ChlP-based analysis of 
TF binding such as CBP/p300 [13] or histone modifications 
(H3K4mel and H3K27ac) [14]; or DHSs and FAIRE map- 
ping [3,50]. The application of these approaches indicates 
that enhancers have unique properties that differentiate 
them from other CREs. However, enhancers predicted by 
these methods need to be validated further by using in vitro 
transient reporter gene system or in vivo transgenic sys- 
tems. It is unrealistic to characterize all cell- and gene- 
specific enhancers within the whole human genome. In 
addition, these methods can not simultaneously take into 
account both cell and gene specificities of enhancers. In this 
study, using the next-generation high-throughput sequen- 
cing combined with multiple enhancer activity tests, we 
comprehensively characterized the enhancers confined to 
the genomic regions of a typical family of KLFs to explore 
both cell and gene specificities. The expression data in 
mRNA-seq may indicate the presence of enhancers, and 
DNase-seq data suggest the location of these enhancers. 
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Figure 8 Methodology of identification and characterization of 
erythroid-specific KLF enhancers. 

V ) 

DHS mapping has higher positive rate than in-silico predic- 
tion methods and takes the advantage of unbiased property 
over ChlP-based methods, which only predict a subset of 
enhancers bound by one or several TFs and largely limited 
by the quality of antibodies. Reporter assays have the ad- 
vantages of saving labor and time over animal models. With 
respect to the identification of erythroid-specific DHSs, we 
used cell lines originated from inner cell mass (HESC), 
ectoderm (HeLa), mesoderm (GM12878 and hTH2), and 
endoderm (HepG2 and CAC02) as control cell types for 
erythroid-specific DHSs screening. However, it is possible 
that certain erythroid-specific DHSs were excluded due to 
the presentation of similar DHSs as a result of binding by a 
different set of TFs in these non-erythroid cell lines. This 
setback may be remedied by the employment of high- 
resolution genome-wide in vivo footprinting [60]. More- 
over, three primary erythroid cells were used for DHSs 
screening, which increased the validity of erythroid-specific 
DHSs. Although some subtle differences, but not specifi- 
city, were observed among DHSs at different developmen- 
tal stages of erythroid cells, the similarities are actually 
unclear (Figure 2, Additional file 1: Figures SI and S2). 
K562 cells are actually suitable for identifying erythroid- 
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specific enhancers. However, there appears to be a limita- 
tion in characterizing the developmental stage specificity of 
enhancers using this cell line because it is at an early stage 
of erythroid differentiation. In the future, we wish to 
analyze this using appropriate erythroid cell lines at differ- 
ent developmental stages or animal models. In reporter as- 
says, control cell lines HEK293 and HeLa are also 
representatives of different germ layers, mesoderm and 
ectoderm. Thus, the application of these non-erythroid 
control cell lines guaranteed the reliable and unbiased iden- 
tification of cell-specific enhancers using our system. Beside 
cell-type specificity, we also explored gene specificity of en- 
hancers, which is of great importance in gene expression 
and remains a challenge in this field. Furthermore, we vali- 
dated the credibility of the identified erythroid-specific KLF 
enhancers through integrated bioinformatic and biochem- 
ical approaches to determine sequence conservation, TP 
binding, and histone modification markers. Identification of 
erythroid-specific KLF enhancers will facilitate the under- 
standing of transcription, expression, and associated func- 
tions of KLF genes in erythroid cells and provide useful 
information to estimate the frequency of DHSs as gene- 
and cell-specific enhancers within the whole genome. With 
proper selection of cell types used as controls for DHS 
mapping and also employed for reporter assays, the ap- 
proach described in this study should be applicable to a 
wide range of cells and genes of interest. 

For example, a family of KLF members was selected to 
determine whether enhancers located in the genomic re- 
gions of KLFs were associated with their expression in 
erythroid cells. Here, we compared the expression of 
KLF mRNA transcripts in erythroid cells (ESER, FLER, 
and PEER) with those in non-erythroid cells (HESC) 
using mRNA-seq analysis. The present results reveal that 
eight KLFs (KLFl, KLFS, KLF6, KLF9, KLFIO, KLFll, 
KLF 13, and KLFl 6) show relatively higher expressions 
in erythroid cells than those in non-erythroid cells 
(Figure 1), which was proposed to be mainly determined 
by the presence of erythroid-specific CREs, particularly 
enhancers, embedded in DHSs of their genomic regions 
(Figure 2, Additional file 1: Figure SI, Additional file 2: 
Table SI). Ten erythroid-specific KLF enhancers were fi- 
nally identified in the genomic regions of KLFl, KLF6, 
KLF9, and KLFl 3, indicating that these KLFs were 
erythroid-specific enhancers -driven genes and harbored 
potential biology in erythroid cells. However, despite 
their high expression in erythroid cells, no erythroid- 
specific enhancer was identified in the genomic regions 
of KLF3, KLFIO, KLFll, and KLF16. The enhancers in 
the genomic regions of KLF3 and KLFll were excluded 
because they demonstrated enhancer activities in non- 
erythroid cells as well (Figures 4 and 5, and Table 1). For 
KLFIO or KLF 16, although the identified erythroid- 
specific or putative erythroid-specific DHSs in their 



genomic regions activated minP in K562 cells, no enhan- 
cer activating KLF-Ps was characterized, reflecting that 
DHSs identified in their genomic regions may work as 
promoter-specific enhancers. This is understandable be- 
cause neither KLFIO nor KLFl 6 have been reported to 
be involved in erythroid differentiation and develop- 
ment. However, the lack of erythroid-specific enhancers 
in some KLFs with high erythroid expression may be 
caused by the limitation and bias of our arbitrary cut- 
offs of genomic regions. Some erythroid-specific KLF en- 
hancers could exist beyond these confined regions (ap- 
proximately 100 kb) or even on different chromosomes. 
These missed enhancers can be found by combining our 
system with the genome-wide chromosome conform- 
ation capture (3C)-base technology in the future, which 
may help in comprehensively understanding KLF tran- 
scription in erythroid cells. DHSs in the genomic region 
of KLF2 and KLFl 7 did not appear to show erythroid- 
specific enhancer activity, which may account for their 
abolished expressions in erythroid cells during mRNA- 
seq analysis (Figure lA). In contrast, other KLFs (KLF4, 
KLFS, KLF7, KLFS, KLF12, and KLF15) were relatively 
highly expressed in non-erythroid cells (Figure 1), indi- 
cating that they were probably not erythroid-specific 
genes and that their genes expression could be driven by 
non-erythroid specific or universal CREs (Additional file 
1: Figure S2A-E, G). No erythroid-specific or putative 
erythroid-specific DHSs were identified in the genomic 
region of KLF 14 (Additional file 1: Figure S2F), which 
could explain why its mRNA transcript could not be 
detected in cell types examined in this study. 

In this study we identified ten novel erythroid-specific 
enhancers in the genomic regions of human KLFs 
(KLFl, KLF6, KLF9, and KLF13), of which KLFl-II, a 
homolog of murine EHSl, acted as an erythroid-specific 
enhancer [40]. Identification of erythroid-specific KLF 
enhancers may reveal novel mechanisms that regulate 
the transcription and functions of these KLFs in eryth- 
roid cells. Till date, KLFl, KLF6, and KLF13 have been 
identified to be critical regulators in erythroid cells 
[30,33,35,37]. KLF9 (BTEBl) is a broadly expressed TF 
with high expression in the developing brain, thymus, 
epithelia, smooth muscle of gut and bladder, vertebrae, 
and cartilage primordial and is implicated to play a role 
in the regulation of cell proliferation and differentiation 
[61]. Klf9~'~ mice have a normal lifespan, but impaired 
specific behavioral activities and decreased small 
intestinal villi [62,63]. KlfT'^ female mice show uterine 
hypoplasia, reduced litter size, increased incidence of 
neonatal deaths in offspring and defects in parturition 
[64]. At present, the biology of KLF9 in erythroid cells 
has not been reported; thus, the identification of 
erythroid-specific KLF9 enhancers implies that KLF9 
could be a novel KLF member that may play critical 
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roles in erythroid cells. Future studies are warranted to 
investigate the functions of KLF9 in hematopoiesis, and 
the mechanisms by which the two identified erythroid- 
specific KLF9 enhancers regulate KLF9 gene transcrip- 
tion and expression in erythroid cells. 

Conclusions 

The primary goal of this study is to develop a method- 
ology to characterize enhancers from massive data gen- 
erated by high-throughput sequencing technology. Using 
the high-throughput sequencing technique, we have pro- 
vided a feasible and practical strategy to extensively 
identify gene- and cell-specific enhancers from DHSs. 
Application of our strategy led to the identification of 
ten erythroid-specific enhancers in the typical KLF fam- 
ily; their enhancer nature and erythroid specificity were 
confirmed using bioinformatic and biochemical analyses. 
Identification of erythroid-specific KLF enhancers indi- 
cates the relatively high expressions and some important 
functions of the corresponding KLFs in erythroid tissues. 

Methods 

Cell culture 

K562, HeLa, and HEK293 cells were used for transient 
transfection to examine the enhancer activities in the 
DLR assay. K562 cells were cultured in RPMI1640 
Medium (Gibco) with 10% fetal bovine serum (Hyclone) 
and penicillin (100 U/ml) -streptomycin (0.1 mg/ml) 
(Invitrogen), and HeLa and HEK293 cells were cultured 
in Dulbecco s Modified Eagle Medium (Gibco) with 10% 
fetal bovine serum (Hyclone) and penicillin (100 U/ml)- 
streptomycin (0.1 mg/ml) (Invitrogen). All cells were 
maintained at 37 °C with 5% CO2 in a humidified 
incubator. 

Transcriptome sequencing and gene expression analysis 

mRNA-seq was originally designed to explore the dy- 
namic transcriptomes during human erythroid differen- 
tiation and development (Yang Y, Wang H, Chang KH, 
Qu H, Zhang Z, Xiong Q, Qi H, Cui P, Lin Q, Ruan X, 
et al: Transcriptome dynamics during human erythroid 
differentiation and development, submitted). In Brief, we 
extracted total RNA from HESC, ESER, FLER, and 
PEER, and depleted 18S and 28S ribosomal RNAs before 
constructing cDNA libraries. Next, we used the ABI 
SOLID System to perform massively parallel ligation se- 
quencing and mapped the sequence reads to human ref- 
erence sequence [release Mar. 2006 (NCBI36/hgl8)]. 
Gene expression intensity was calculated by normalizing 
the read counts to RPKM according to the gene length 
and total mapped reads, and genes with RPKM < 0.01 
were removed. 



Quantitative real-time PCR 

Total RNA was extracted from HESC, ESER, FLER, and 
PEER cells using TRIZOL" Reagent (Invitrogen, 15596- 
018) and DNA contamination was removed using the 
TURBO DNA-free™ Kit (Ambion, AM1907). DNA-free 
RNA was reverse transcribed using the RevertAid First 
Strand cDNA Synthesis Kit (Thermo Scientific, K1622) 
according to manufacturers instruction. Primers were 
designed using Primer 5 (Additional file 2: Table S4). 
PCR were performed in triplicate using Maxima® SYBR 
Green/ROX qPCR Master Mixes (2x) (Fermentas, 
K0223) and CFX96™ Real-Time PCR Detection System 
(Bio-rad), and data were analyzed using the CFX 
Manager™ Software. The KLF transcript levels were first 
calculated by referring to those of 18S ribosomal RNA, 
and the expression levels of KLFs in ESER, FLER, and 
PBER were normalized to those in HESC. The statistical 
significance of differences between individual KLFs' ex- 
pressions in erythroid cell types and those in HESC were 
calculated using the independent-samples ^-test. 

Digital DNase I sequencing and erythroid-specific or 
putative erythroid-specific DHS selection 

In this study, the DNase-seq data used were obtained 
from the University of Washington [41,60], and 
are available through the UCSC Genome Browser 
(http://genome.ucsc.edu) and the NCBI Gene Expres- 
sion Omnibus (GEO) data repository under accessions 
GSE29692 and GSE32970. DHSs were identified 
using an algorithm developed by the University of 
Washington [65]. In this study, FDR threshold of 0.5% 
was used to define DHS for each cell type. The do- 
mains of KLF loci were defined as extensions from 
70 kb upstream of TSSs to 20 kb downstream of the 
poly (A) sites. ESER, FLER, and PBER cells represent 
primary erythroid cells at different developmental 
stages, and K562 cells represent erythroleukemia cells. 
In DHS screening, all other cell types were employed 
as non-erythroid control cell types. DHSs in these do- 
mains were considered to be erythroid specific if they 
were only present in erythroid cells and were identified 
as putative erythroid specific if they were present in 
erythroid cells and exhibited much lower peaks in one 
or two non-erythroid cell types. 

DNA manipulation 

To generate firefly luciferase reporter constructs with 
minP, the identified 23 erythroid-specific or putative 
erythroid-specific DHSs were amplified from human 
blood genomic DNA with Pfu DNA Polymerase 
(Promega, M7741) and inserted upstream of minP in 
the pGL4.23 expression vector (Promega, E8411). To 
further generate firefly luciferase reporter constructs 
with KLF-Ps, TSSs of individual KLFs were predicted 
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from UCSC Genome Browser, and fragments of 
approximately 1 kb in length upstream of TSSs 
(Additional file 2: Table S2) were amplified and cloned 
into pGL4.10 vector, followed by inserting DHSs up- 
stream of the corresponding KLF-Ps. The activities of 
KLF-Ps were examined and used as baselines. HS2, a 
classical enhancer in |3-globin LCR, was cloned into 
the corresponding vectors upstream of minP or KLF- 
Ps and used as positive controls in the DLR assay [16]. 
The primers used in this study are listed in Additional 
file 2: Tables S5 and S6. The integrity of the reporter 
constructs was confirmed using restriction digestions 
and sequencing. 

Transient transfection and DLR assay 

Cells were seeded into 48-well plates. K562 cells (1.5 x 
10^/ well) were transiently transfected with 500 ng of 
firefly luciferase vector and 0.75 ng of a Renilla 
luciferase vector, pRL-TK (Promega, E2441), using 
Lipofectamine LTX and Plus Reagent (Invitrogen, 
15338-100). HeLa (7 x loWll) and HEK293 (7 x 
10^/ well) cells were similarly transfected using the 
Lipofectamine 2000 Transfection Reagent (Invitrogen, 
11668-019) as per the manufacturers instructions. 
Forty-eight hours after transfection, cells were harvested 
to prepare for the cell lysates, and luciferase activities 
were immediately measured with the Dual-Luciferase 
Reporter Assay System (Promega, E1910) as per the 
manufacturer s instructions. Transient transfections were 
repeated at least twice, and every construct was 
transfected in triplicates. Standard deviations were 
shown as error bars above respective columns. For data 
processing, firefly luciferase activity was normalized to 
that of Renilla luciferase in all the groups, and the rela- 
tive activity of each promoter was normalized as 1. The 
statistical significance of differences between promoters 
and DHSs were analyzed using one-way ANOVA func- 
tion in R language. 

Bioinformatic analyses 

Data of conserved elements in placental mammals [51], 
layered H3K4mel and layered H3K27ac [52], and Txn 
Factor ChIP data [53] in the regions of 23 DHSs were 
obtained from UCSC Genome Browser and summarized 
in Table 1. 

TFs binding to the 23 DHSs were collected from Txn 
Factor ChIP track on UCSC Genome Browser. A heat 
map was drawn after K-means clustering using R 
language. 

Conserved motifs embedded in erythroid-specific KLF 
enhancers were analyzed using the MEME software 
(http://meme.ebi.edu.au/meme/cgi-bin/meme.cgi), and 
these de novo discovered motifs were searched against the 
ENCODE-motifs database (http://www.broadinstitute. 



org/'-pouyak/motif-disc/human/) using TOMTOM 
algorithm (http://meme.ebi.edu.au/meme/cgi-bin/tomtom. 
cgi) [54]. 

EMSA 

Bio-ll-dUTP (Ambion, AM8450) and TdT (New England 
Biolabs, M0315s) were used to label the 3' -OH of single- 
stranded oligos (5'-AGC ATG AAG TAG GAG AGT GAT 
GAT GAC AGT GCT GCT TTG CAC AGA TAA GCC 
TGG CGG A-3', 5'-TCC GCC AGG CTT ATC TGT 
GCA AAG CAG CAC TGT CAT CAT CAC TCT CCT 
ACT TCA TGC T-3') and complementary oligos were 
annealed as per the manufacturers instructions. Nuclear 
proteins of K562, HeLa, and HEK293 cells were extracted 
using the rapid micro-preparation method (lysis buffer: 
10 mM Hepes [pH7.9], 10 mM KCl, 1.5 mM MgCl2, 
0.5 mM PMSF, 0.5 mM DTT; high-salt extraction buffer: 
20 mM Hepes [pH7.9], 25% glycerol, 0.42 M NaCl, 1.5 mM 
MgCl2, 0.2 mM EDTA, 0.5 mM PMSF, 0.5 mM DTT) [66]. 
Protein concentrations were measured using the BCA Pro- 
tein Assay Kit (Pierce, 23225). EMSA was performed in 
20 (il reaction mixture, containing 30 fmol biotin-labeled 
oligos and 7 (ig nuclear extract with or without 7.2 pmol 
unlabeled oligos using the LightShift Chemiluminescent 
EMSA Kit (Pierce, 20148) according to the manufacturers 
instruction (Figure 7B). 

Availability of supporting data 

DNase-seq data are available through the UCSC Gen- 
ome Browser (http://genome.ucsc.edu/cgi-bin/hgFileUi? 
db=hgl9&g=wgEncodeUwDnase), or through the NCBI 
Gene Expression Omnibus (GEO) data repository (ac- 
cession numbers: GSE29692, GSE32970). The other data 
sets supporting the results of this article are included 
within the article and the additional files. 

Additional files 



Additional file 1: Figure 51. Chromatin profiles of KLF genes 
containing erythroid-specific (arrow and column in green) or putative 
erythroid-specific (arrow and column in blue) DHSs. KLF loci were 
arbitrarily defined as extension from 70 kb upstream of the TSSs to 20 kb 
downstream of the poly (A) sites. Erythroid-specific or putative erythroid- 
specific DHSs were respectively marked with green and blue arrowheads 
and named with Roman numbers. Figure S2. Chromatin profiles KLF 
genes without erythroid-specific or putative erythroid-specific DHSs. 
Figure S3. Distribution statistics of the identified erythroid-specific or 
putative erythroid-specific DHSs and enhancers in the genomic regions 
of KLFs. A. Statistics of the distribution of identified erythroid-specific or 
putative erythroid-specific DHSs relative to KLF genes and TSSs. In total, 
18 (78%) and five (22%) DHSs are localized to the intergenic and intronic 
regions, respectively; 15 (65.2%) DHSs are located far distal (>10 kb) to 
TSSs, four (17.4%) DHSs are located distal (2-10 kb) to TSSs, and four 
(17.4%) DHSs are located in proximal (<2 kb) promoter regions. DHS 
KLFl-lll contains TSS (Additional file 1: Table SI). B. Statistics of the 
distribution of the identified enhancers under the control of minP in 
K562 cells relative to KLF genes and TSSs. C. Statistics of the identified 
erythroid KLF enhancer distribution relative to the KLF genes and TSSs. 
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D. Statistics of the identified erythroid-specific KLF enhancer distribution 
relative to KLF genes and TSSs. 

Additional file 2: Table SI. The relative position and length of 23 
identified erythroid-specific or putative erythroid-specific DHSs. 
Table S2. Position, length, GC content, chromatin accessibility (DNase I 
hypersensitivity), and references often KLF promoters used in enhancer 
assays. Table S3. Activities of KLF promoters measured by the luciferase 
reporter assay. Table S4. Primers used for measuring the expression patterns 
of KLFs in real-time PGR. Table S5. Primers used for amplification of erythroid- 
specific DHS fragments inserted in luciferase reporter constructs. Table S6. 
Primers used for amplification of KLF promoters. Supplementary References. 
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